具有多个层次结构的推理能力是用于自然语言处理的连续感应偏差的有吸引力和理想的特性。最先进的变形金刚和LSTM架构隐含地编码这些偏差吗?要回答这一点,我们提出了一个诊断数据集,用于系统地评估最先进的神经序列模型中的分层推理。虽然已经有先前的评估框架如列表或逻辑推断,但我们的工作提出了一种新颖且更自然的环境,其中我们的模型学会使用多个显式层次结构而不是一个,即需要执行长期执行能力的原因序列记忆,关系推理的序列结构。因此,由一组严谨的实验支持,我们展示了(1)变压器和LSTM模型在系统泛化中令人惊讶地失败,(2)在层次结构之间增加的引用,变压器不会比随机更好地执行。
translated by 谷歌翻译
诱导顺序数据的潜在树结构是今天NLP研究景观的新出现趋势,主要是由最近的方法(如Gumbel LSTM和有序神经元)(LSTM)所普及。本文提出了Fasttrees,一种新的通用神经模块,用于快速序列编码。与最先前的作品不同,考虑到树归类所需的复发,我们的工作探讨了并行树归纳的概念,即,通过分层电感偏置的并行,非自动增加时尚的分层感应偏差。为此,我们提出的Fasttrees在四个建立良好的序列建模任务中实现了对LSTM的竞争或卓越的性能,即语言建模,逻辑推断,情感分析和自然语言推断。此外,我们表明FastTrees模块可以应用于增强变压器模型,实现三个序列转换任务(机器翻译,主语 - 动词协议和数学语言理解)实现性能增益,为模块化树感应模块铺平了道路。总的来说,我们以+ 4%的逻辑推理任务和数学语言理解+ 8%的现有最先进的模型。
translated by 谷歌翻译
预处理的基于变压器的语言模型(LMS)显示出显着的自然语言生成能力。凭借其巨大的潜力,控制这种LM的文本生成引起了人们的关注。尽管有一些研究试图控制生成的文本的高级属性(例如情感和主题),但仍然缺乏对其在单词和短语级别上的内容的更精确的控制。在这里,我们建议内容调节器(COCON)以细粒度的水平控制LM的输出文本。在我们的自我监督方法中,Cocon Block学会了通过调节从LM中扣留的内容输入来帮助LM完成部分观察到的文本序列。通过实验,我们表明Cocon可以自然地将目标内容纳入生成的文本中,并以零拍的方式控制高级文本属性。
translated by 谷歌翻译
An oft-cited open problem of federated learning is the existence of data heterogeneity at the clients. One pathway to understanding the drastic accuracy drop in federated learning is by scrutinizing the behavior of the clients' deep models on data with different levels of "difficulty", which has been left unaddressed. In this paper, we investigate a different and rarely studied dimension of FL: ordered learning. Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL. We present theoretical analysis and conduct extensive empirical studies on the efficacy of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random curriculum. We find that curriculum learning largely alleviates non-IIDness. Interestingly, the more disparate the data distributions across clients the more they benefit from ordered learning. We provide analysis explaining this phenomenon, specifically indicating how curriculum training appears to make the objective landscape progressively less convex, suggesting fast converging iterations at the beginning of the training procedure. We derive quantitative results of convergence for both convex and nonconvex objectives by modeling the curriculum training on federated devices as local SGD with locally biased stochastic gradients. Also, inspired by ordered learning, we propose a novel client selection technique that benefits from the real-world disparity in the clients. Our proposed approach to client selection has a synergic effect when applied together with ordered learning in FL.
translated by 谷歌翻译
Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics are disregarded and the person is treated as a body or a collection of body parts. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that, commensurate with prior research in psychology, human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d >.8) and sadness (d >.5). A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women than for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are likely to be associated with sexual descriptions relative to images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP (age 17), and up to 40% of the time for Stable Diffusion (ages 14 and 18); the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on automatically collected web scrapes learn biases of sexual objectification, which propagate to downstream applications.
translated by 谷歌翻译
Pre-trained language models have been successful in natural language generation (NLG) tasks. While various decoding methods have been employed, they often produce suboptimal results. We first present an empirical analysis of three NLG tasks: summarization, machine translation, and constrained text generation. We found that selecting the best output from the results of multiple decoding methods can significantly improve performance. To further improve reranking for NLG tasks, we proposed a novel method, \textsc{PairReranker}, which uses a single encoder and a pairwise loss function to jointly encode a source input and a pair of candidates and compare them. Experiments on three NLG tasks demonstrated the effectiveness and flexibility of \textsc{PairReranker}, showing strong results, compared with previous baselines. In addition, our \textsc{PairReranker} can generalize to significantly improve GPT-3 (text-davinci-003) results (e.g., 24.55\% on CommonGen and 11.35\% on WMT18 zh-en), even though our rerankers are not trained with any GPT-3 candidates.
translated by 谷歌翻译
Large pre-trained language models have recently enabled open-ended generation frameworks (e.g., prompt-to-text NLG) to tackle a variety of tasks going beyond the traditional data-to-text generation. While this framework is more general, it is under-specified and often leads to a lack of controllability restricting their real-world usage. We propose a new grounded keys-to-text generation task: the task is to generate a factual description about an entity given a set of guiding keys, and grounding passages. To address this task, we introduce a new dataset, called EntDeGen. Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions. Our EntDescriptor model is equipped with strong rankers to fetch helpful passages and generate entity descriptions. Experimental result shows a good correlation (60.14) between our proposed metric and human judgments of factuality. Our rankers significantly improved the factual correctness of generated descriptions (15.95% and 34.51% relative gains in recall and precision). Finally, our ablation study highlights the benefit of combining keys and groundings.
translated by 谷歌翻译
We study critical systems that allocate scarce resources to satisfy basic needs, such as homeless services that provide housing. These systems often support communities disproportionately affected by systemic racial, gender, or other injustices, so it is crucial to design these systems with fairness considerations in mind. To address this problem, we propose a framework for evaluating fairness in contextual resource allocation systems that is inspired by fairness metrics in machine learning. This framework can be applied to evaluate the fairness properties of a historical policy, as well as to impose constraints in the design of new (counterfactual) allocation policies. Our work culminates with a set of incompatibility results that investigate the interplay between the different fairness metrics we propose. Notably, we demonstrate that: 1) fairness in allocation and fairness in outcomes are usually incompatible; 2) policies that prioritize based on a vulnerability score will usually result in unequal outcomes across groups, even if the score is perfectly calibrated; 3) policies using contextual information beyond what is needed to characterize baseline risk and treatment effects can be fairer in their outcomes than those using just baseline risk and treatment effects; and 4) policies using group status in addition to baseline risk and treatment effects are as fair as possible given all available information. Our framework can help guide the discussion among stakeholders in deciding which fairness metrics to impose when allocating scarce resources.
translated by 谷歌翻译
聚集的联合学习(FL)已显示通过将客户分组为群集,从而产生有希望的结果。这在单独的客户群在其本地数据的分布方面有显着差异的情况下特别有效。现有的集群FL算法实质上是在试图将客户群体组合在一起,以便同一集群中的客户可以利用彼此的数据来更好地执行联合学习。但是,先前的群集FL算法试图在培训期间间接学习这些分布相似性,这可能会很耗时,因为可能需要许多回合的联合学习,直到群集的形成稳定为止。在本文中,我们提出了一种新的联合学习方法,该方法直接旨在通过分析客户数据子空间之间的主要角度来有效地识别客户之间的分布相似性。每个客户端都以单一的方式在其本地数据上应用截断的奇异值分解(SVD)步骤,以得出一小部分主向量,该量提供了一个签名,可简洁地捕获基础分布的主要特征。提供了一组主要的主向量,以便服务器可以直接识别客户端之间的分布相似性以形成簇。这是通过比较这些主要向量跨越的客户数据子空间之间主要角度的相似性来实现的。该方法提供了一个简单而有效的集群FL框架,该框架解决了广泛的数据异质性问题,而不是标签偏斜的更简单的非iids形式。我们的聚类FL方法还可以为非凸目标目标提供融合保证。我们的代码可在https://github.com/mmorafah/pacfl上找到。
translated by 谷歌翻译
语言模型(LMS)被证明具有对物理世界的常识知识,这对于在日常情况下完成任务至关重要。但是,LMS是否有能力为具体任务生成扎根的可执行计划,这仍然是一个悬而未决的问题。这是非常具有挑战性的,因为LMS没有“眼睛”或“手”来感知现实的环境。在这项工作中,我们展示了有关这个重要研究问题的第一个研究。我们首先提出了一个名为G-Planet的新型问题公式,它将其作为输入一个高级目标和在特定环境中的对象表。预期输出是一个计划,该计划包括逐步指令供代理执行。为了实现此问题的研究,我们建立了一个评估协议,并设计了一个专门的指标来评估计划的质量。在我们的广泛实验中,我们表明,为编码环境添加扁平表并使用迭代解码策略都可以提高LMS的基础计划能力。我们对结果的分析也导致有趣的非平凡发现。
translated by 谷歌翻译